Optimal Sample Selection for Batch-mode Reinforcement Learning

Authors

  • Emmanuel Rachelson
  • François Schnitzler
  • Louis Wehenkel
  • Damien Ernst
Abstract

We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time Optimal Control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to the identification of a small set of one-step system transitions that leads to high-quality policies when used as input to a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular instance of this OSS meta-algorithm that uses tree-based Fitted Q-Iteration as the batch-mode RL algorithm and Cross Entropy search as the method for navigating efficiently in the space of sample sets. The results show that this particular instance of OSS rapidly identifies small sample sets that lead to high-quality policies.
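
The loop described in the abstract (propose sample sets, run a batch-mode RL algorithm on each, score the resulting policy, refit the search distribution) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a toy 1-D system, a nearest-neighbour fitted Q-iteration standing in for tree-based Fitted Q-Iteration, and a Gaussian (states) plus Bernoulli (actions) search distribution for the Cross Entropy step. All function names and parameters (`step`, `fitted_q`, `policy_return`, `oss_cross_entropy`) are hypothetical.

```python
import numpy as np

GAMMA = 0.9
ACTIONS = np.array([-1.0, 1.0])

def step(x, u):
    """Toy 1-D system (an assumption): move by 0.1*u inside [0, 1]; reward 1 at the right edge."""
    x_next = float(np.clip(x + 0.1 * u, 0.0, 1.0))
    return (1.0 if x_next >= 1.0 else 0.0), x_next

def fitted_q(samples, n_iter=30):
    """Nearest-neighbour fitted Q-iteration, a simple stand-in for tree-based FQI."""
    xs = np.array([s[0] for s in samples])
    us = np.array([s[1] for s in samples])
    rs = np.array([s[2] for s in samples])
    xn = np.array([s[3] for s in samples])
    q = np.zeros(len(samples))

    def predict(x, u, q_vals):
        # Q(x, u) is the value stored at the closest sample that used action u.
        idx = np.flatnonzero(us == u)
        if idx.size == 0:
            return 0.0
        return q_vals[idx[np.argmin(np.abs(xs[idx] - x))]]

    for _ in range(n_iter):
        # Bellman backup on every transition (x, u, r, x') of the sample set.
        q = np.array([r + GAMMA * max(predict(x2, a, q) for a in ACTIONS)
                      for r, x2 in zip(rs, xn)])
    return lambda x, u: predict(x, u, q)

def policy_return(q_fn, starts=(0.0, 0.3, 0.6), horizon=20):
    """Average discounted return of the greedy policy from a few start states."""
    total = 0.0
    for x in starts:
        for t in range(horizon):
            u = ACTIONS[int(np.argmax([q_fn(x, a) for a in ACTIONS]))]
            r, x = step(x, u)
            total += GAMMA ** t * r
    return total / len(starts)

def oss_cross_entropy(n_samples=6, pop=40, n_elite=8, n_gen=10, seed=0):
    """Cross Entropy search over sets of n_samples one-step transitions."""
    rng = np.random.default_rng(seed)
    mu = np.full(n_samples, 0.5)       # Gaussian mean per sample state
    sigma = np.full(n_samples, 0.3)    # Gaussian std per sample state
    p_right = np.full(n_samples, 0.5)  # Bernoulli parameter per sample action
    best_score, best_set = -np.inf, None
    for _ in range(n_gen):
        xs = np.clip(rng.normal(mu, sigma, size=(pop, n_samples)), 0.0, 1.0)
        right = rng.random((pop, n_samples)) < p_right
        scores = np.empty(pop)
        for i in range(pop):
            acts = np.where(right[i], 1.0, -1.0)
            # Query the system once per (x, u) to obtain (x, u, r, x').
            samples = [(x, u) + step(x, u) for x, u in zip(xs[i], acts)]
            scores[i] = policy_return(fitted_q(samples))
            if scores[i] > best_score:
                best_score, best_set = scores[i], samples
        # Refit the search distribution on the elite sample sets.
        elite = np.argsort(scores)[-n_elite:]
        mu = xs[elite].mean(axis=0)
        sigma = xs[elite].std(axis=0) + 1e-3
        p_right = np.clip(right[elite].mean(axis=0), 0.05, 0.95)
    return best_set, best_score
```

Calling `oss_cross_entropy()` returns the best sample set found and the score of the policy learned from it. Swapping `fitted_q` for another batch-mode RL routine leaves the outer search unchanged, which is the sense in which OSS is a meta-algorithm.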

Similar resources

Batch Reinforcement Learning for Spoken Dialogue Systems with Sparse Value Function Approximation

In this paper, we propose to combine sample-efficient generalization frameworks for RL with a feature selection algorithm for the learning of an optimal spoken dialogue system (SDS) strategy.

Batch mode reinforcement learning based on the synthesis of artificial trajectories

In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value functi...

Iteratively Extending Time Horizon Reinforcement Learning

Reinforcement learning aims to determine an (infinite time horizon) optimal control policy from interaction with a system. It can be solved by approximating the so-called Q-function from a sample of four-tuples (xt, ut, rt, xt+1) where xt denotes the system state at time t, ut the control action taken, rt the instantaneous reward obtained and xt+1 the successor state of the system, and by deter...

Tree-Based Batch Mode Reinforcement Learning

Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (xt, ut, rt, xt+1) where xt denotes the system state at time t, ut the control action taken, rt the instantaneous reward obtained and xt+1 the succe...

Adaptive Reactive Job-shop Scheduling with Reinforcement Learning Agents

Traditional approaches to solving job-shop scheduling problems assume full knowledge of the problem and search for a centralized solution for a single problem instance. Finding optimal solutions, however, requires an enormous computational effort, which becomes critical for large problem instance sizes and, in particular, in situations where frequent changes in the environment occur. In this ar...

Publication date: 2011